{
  "nbformat": 4,
  "nbformat_minor": 4,
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.8.12"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# Import pre-annotations"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "When you have a data pipeline with models outputting predictions, you want to be able to import those predicted annotations, or pre-annotations, into Label Studio for review and correction.\n",
        "\n",
        "In this example, use the [Label Studio SDK](https://labelstud.io/sdk/index.html) to write a Python script that transforms and imports predictions into a Label Studio project."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Connect to Label Studio"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Connect to the Label Studio API using the Client module of the SDK:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "ExecuteTime": {
          "end_time": "2024-07-09T09:13:46.006673Z",
          "start_time": "2024-07-09T09:13:44.982373Z"
        }
      },
      "execution_count": 1,
      "outputs": [],
      "source": [
        "from label_studio_sdk.client import LabelStudio\n",
        "\n",
        "LABEL_STUDIO_URL = 'http://localhost:8080'\n",
        "# find your key at Account & Settings -> Access Token\n",
        "API_KEY = '27c982caa9e599c9d81b25b00663e7d4f82c9e3c'\n",
        "\n",
        "ls = LabelStudio(base_url=LABEL_STUDIO_URL, api_key=API_KEY)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Create a project"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Create a project for your labels. In this example, create an [image classification](https://labelstud.io/templates/image_classification.html) project:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "ExecuteTime": {
          "end_time": "2024-07-09T09:15:24.191827Z",
          "start_time": "2024-07-09T09:15:24.115609Z"
        }
      },
      "execution_count": 2,
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "<View>\n",
            "  <Image name=\"image\" value=\"$image\"/>\n",
            "  <Choices name=\"image_class\" toName=\"image\">\n",
            "    <Choice value=\"Cat\"/>\n",
            "    <Choice value=\"Dog\"/>\n",
            "  </Choices>\n",
            "</View>\n"
          ]
        }
      ],
      "source": [
        "from label_studio_sdk.label_interface import LabelInterface\n",
        "from label_studio_sdk.label_interface.create import choices\n",
        "\n",
        "label_config = LabelInterface.create({\n",
        "    'image': 'Image',\n",
        "    'image_class': choices(['Cat', 'Dog'])\n",
        "})\n",
        "print(label_config)\n",
        "\n",
        "project = ls.projects.create(\n",
        "    title='Project Created from SDK: Image Preannotation',\n",
        "    label_config=label_config\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Import tasks with pre-annotations"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "You can import tasks with pre-annotations in several different ways. Choose one of these three methods for your script, based on how your model predictions are formatted."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### 1. Import tasks in Label Studio JSON format"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "You can format tasks in basic [JSON Label Studio format](https://labelstud.io/guide/tasks.html#Basic-Label-Studio-JSON-format) and choose to import the pre-annotations that way."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "scrolled": true,
        "ExecuteTime": {
          "end_time": "2024-07-09T09:17:03.178950Z",
          "start_time": "2024-07-09T09:17:03.124176Z"
        }
      },
      "execution_count": 5,
      "outputs": [
        {
          "data": {
            "text/plain": "ProjectsImportTasksResponse(task_count=2, annotation_count=0, predictions_count=None, duration=0.031484127044677734, file_upload_ids=[], could_be_tasks_list=False, found_formats=[], data_columns=[], prediction_count=2)"
          },
          "execution_count": 5,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "ls.projects.import_tasks(\n",
        "    project.id,\n",
        "    request=[{\n",
        "        'data': {'image': 'https://data.heartex.net/open-images/train_0/mini/0045dd96bf73936c.jpg'},\n",
        "        'predictions': [{\n",
        "            'result': [{\n",
        "                'from_name': 'image_class',\n",
        "                'to_name': 'image',\n",
        "                'type': 'choices',\n",
        "                'value': {\n",
        "                    'choices': ['Dog']\n",
        "                }\n",
        "            }],\n",
        "            'score': 0.87\n",
        "        }]\n",
        "    }, {\n",
        "        'data': {'image': 'https://data.heartex.net/open-images/train_0/mini/0083d02f6ad18b38.jpg'},\n",
        "        'predictions': [{\n",
        "            'result': [{\n",
        "                'from_name': 'image_class',\n",
        "                'to_name': 'image',\n",
        "                'type': 'choices',\n",
        "                'value': {\n",
        "                    'choices': ['Cat']\n",
        "                }\n",
        "            }],\n",
        "            'score': 0.65\n",
        "        }]\n",
        "    }]\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "> If you're not importing predictions for an image classification task, see the documentation for [importing pre-annotations](https://labelstud.io/guide/predictions.html) in Label Studio JSON format for more examples."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### 2. Import simple JSON predictions"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "This simpler JSON format is a way to import pre-annotation results from a specific field for a single image. In this case, import task data with predictions in the `pet` field, and specify that the `pet` field contains the predicted classification:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "ExecuteTime": {
          "end_time": "2024-07-09T09:17:33.852323Z",
          "start_time": "2024-07-09T09:17:33.804344Z"
        }
      },
      "execution_count": 6,
      "outputs": [
        {
          "data": {
            "text/plain": "ProjectsImportTasksResponse(task_count=2, annotation_count=0, predictions_count=None, duration=0.027251005172729492, file_upload_ids=[], could_be_tasks_list=False, found_formats=[], data_columns=[], prediction_count=2)"
          },
          "execution_count": 6,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "ls.projects.import_tasks(\n",
        "    project.id,\n",
        "    request=[{'image': f'https://data.heartex.net/open-images/train_0/mini/0045dd96bf73936c.jpg', 'pet': 'Dog'},\n",
        "             {'image': f'https://data.heartex.net/open-images/train_0/mini/0083d02f6ad18b38.jpg', 'pet': 'Cat'}],\n",
        "    preannotated_from_fields=['pet']\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### 3. Import predictions from CSV"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "If your predictions are stored in CSV files, you can use pandas to read the dataframes:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "execution_count": 1,
      "outputs": [],
      "source": [
        "!pip install pandas"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "scrolled": true
      },
      "execution_count": 1,
      "outputs": [],
      "source": [
        "import pandas as pd\n",
        "df = pd.read_csv('data/images.csv')"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Then you can specify the CSV file with the image references, and specify that the `pet` field contains the predicted classification:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "scrolled": true
      },
      "execution_count": 1,
      "outputs": [],
      "source": [
        "ls.projects.import_tasks(\n",
        "    project.id,\n",
        "    request=df.to_dict(orient='records'),\n",
        "    preannotated_from_fields=['pet']\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### 4. Import predictions to existing tasks"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "In some cases, you may want to apply predictions to already imported tasks. For example, you can retrieve tasks from Label Studio, then create a new prediction:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "execution_count": 1,
      "outputs": [],
      "source": [
        "from label_studio_sdk.label_interface.objects import PredictionValue\n",
        "\n",
        "li = ls.projects.get(id=project.id).get_label_interface()\n",
        "\n",
        "for task in ls.tasks.list(project=project.id, include=['id']):\n",
        "    prediction = PredictionValue(\n",
        "        # Tag predictions with specific model version\n",
        "        model_version='my_model_v1',\n",
        "        # Define your labels here\n",
        "        result=[\n",
        "            li.get_control('image_class').label(['Dog']),\n",
        "        ]\n",
        "    )\n",
        "    ls.predictions.create(task=task.id, **prediction.model_dump())"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "In more complex annotation scenarios, check out [JSON format for expected predictions / preannotations](https://labelstud.io/guide/predictions.html)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Annotate tasks in Label Studio"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "After importing pre-annotations using your preferred method, you can open Label Studio at http://localhost:8080 and correct, review the predictions, and finish annotating your tasks."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Calculate prediction accuracy"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "If you're using Label Studio Enterprise, you can calculate the accuracy of your predictions compared to the corrected annotations created as a ground truth dataset.\n",
        "\n",
        "Install and import the [evalme package](https://github.com/heartexlabs/label-studio-evalme) and calculate an agreement score for each task, comparing the annotation to the prediction:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "execution_count": 1,
      "outputs": [],
      "source": [
        "!pip install label-studio-evalme"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "execution_count": 1,
      "outputs": [],
      "source": [
        "from evalme.metrics import get_agreement\n",
        "\n",
        "print('Pre-annotation agreement scores:')\n",
        "\n",
        "total_score = 0\n",
        "n = 0\n",
        "for task in ls.tasks.list(project=project.id):\n",
        "    score = get_agreement(task.annotations[0], task.predictions[0])\n",
        "    print(f'{task.id} ==> {score}')\n",
        "    total_score += score\n",
        "    n += 1\n",
        "\n",
        "print(f'Pre-annotation accuracy: {100 * total_score / n: .0f}%')"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Conclusion"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "With the Label Studio SDK, you can more easily import pre-annotated tasks into Label Studio so that you can create ground truth datasets, visually review the accuracy of predictions, and more.\n",
        "\n",
        "The `preannotated_from_fields` option for the `import_tasks()` method makes it easier to add your predictions without worrying about the intricacies of the Label Studio JSON format, but you can still use that field to add valuable metadata such as prediction scores and model versions to your pre-annotated task data."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "collapsed": false
      },
      "execution_count": null,
      "outputs": [],
      "source": []
    }
  ]
}
